@1145284121

What does this PR do?

Implements #9
Adds an implementation of sequence parallelism (DeepSpeed-Ulysses) combined with SVG inference for CogVideoX and HunyuanVideo.
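For readers unfamiliar with Ulysses-style sequence parallelism, here is a minimal single-process sketch of the idea. It is not the code in this PR and does not use xfuser: the all-to-all exchanges are simulated with plain Python lists, and all function names (`seq_to_head_shards`, `head_to_seq_shards`, `ulysses_attention`) are hypothetical. Each simulated rank starts with a slice of the sequence and all heads, swaps to the full sequence with a slice of the heads, runs ordinary attention, and swaps back.

```python
import math
import random


def attention(q, k, v):
    """Single-head softmax attention over lists of S vectors of length D."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z for c in range(d)])
    return out


def multi_head_attention(q, k, v):
    """Reference path: x[s][h] is the D-vector of token s, head h."""
    heads = len(q[0])
    per_head = [
        attention([t[h] for t in q], [t[h] for t in k], [t[h] for t in v])
        for h in range(heads)
    ]
    return [[per_head[h][s] for h in range(heads)] for s in range(len(q))]


def seq_to_head_shards(shards, world):
    """Simulated all-to-all: sequence-sharded input -> head-sharded output."""
    per = len(shards[0][0]) // world
    return [
        [tok[r * per:(r + 1) * per] for shard in shards for tok in shard]
        for r in range(world)
    ]


def head_to_seq_shards(per_rank, world):
    """Simulated reverse all-to-all: head-sharded -> sequence-sharded."""
    chunk = len(per_rank[0]) // world
    return [
        [[vec for rank in per_rank for vec in rank[s]]
         for s in range(r * chunk, (r + 1) * chunk)]
        for r in range(world)
    ]


def ulysses_attention(q, k, v, world):
    """Assumes sequence length and head count are both divisible by world."""
    chunk = len(q) // world
    split = lambda x: [x[r * chunk:(r + 1) * chunk] for r in range(world)]
    q_h, k_h, v_h = (seq_to_head_shards(split(x), world) for x in (q, k, v))
    # Each rank attends over the full sequence, but only for its own heads.
    per_rank = [multi_head_attention(q_h[r], k_h[r], v_h[r]) for r in range(world)]
    shards = head_to_seq_shards(per_rank, world)
    return [tok for shard in shards for tok in shard]


def random_tensor(rng, seq, heads, dim):
    return [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(heads)]
            for _ in range(seq)]


rng = random.Random(0)
q, k, v = (random_tensor(rng, 8, 4, 2) for _ in range(3))
assert ulysses_attention(q, k, v, world=2) == multi_head_attention(q, k, v)
```

Because every head still sees the full sequence, the sharded path performs the same arithmetic per head as the single-device path, so the two outputs agree. In a real multi-GPU run the two list reshuffles would be collective all-to-all communications.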

Description

  • Tested on A100/H100 with torch 2.6 and xfuser 0.4.1 (from the xDiT project). Inference is 3.7-3.8x faster on 4 GPUs and 7.4-7.5x faster on 8 GPUs.
  • Dense and SVG parallel results are completely consistent with single-GPU execution.
  • For more details, please refer to the xDiT project and their blog.
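To put the reported speedups in context, the short sketch below converts them into parallel efficiency (speedup divided by GPU count). It assumes the speedups are measured against a single-GPU baseline, which the description implies but does not state; `parallel_efficiency` is a hypothetical helper.

```python
def parallel_efficiency(speedup, num_gpus):
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup / num_gpus


# Speedup ranges as reported in this PR description.
reported = {4: (3.7, 3.8), 8: (7.4, 7.5)}

for gpus, (low, high) in reported.items():
    lo_eff = parallel_efficiency(low, gpus)
    hi_eff = parallel_efficiency(high, gpus)
    print(f"{gpus} GPUs: {lo_eff:.1%} to {hi_eff:.1%} of linear scaling")
```

Under that assumption, both configurations land at roughly 92-95% of linear scaling, so going from 4 to 8 GPUs costs little additional efficiency.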

Visualization

CogVideoX inference results (SVG, single GPU):

output-cog-single.mp4

CogVideoX inference results (SVG + ulysses=2):

output-cog-uly2-0907.mp4

HunyuanVideo inference results (SVG, single GPU):

hunyuan_output_svg_single.mp4

HunyuanVideo inference results (SVG + ulysses=4):

hunyuan_output_svg_step50_sp4.mp4

@haochengxi
Collaborator

Thanks for the contribution! We will verify this change and then merge it into our repo.
